
refact: refactor format backends into dpdata.formats#970

Draft
njzjz-bot wants to merge 4 commits into deepmodeling:master from njzjz-bot:oc-fix-pr-946-ci

Conversation

@njzjz-bot
Contributor

@njzjz-bot njzjz-bot commented May 5, 2026

This PR fixes the CI failures from #946 after moving implementation modules.

Changes:

  • Keep real format backends under dpdata.formats.
  • Move the non-format MD analysis/tool modules back to dpdata.md instead of dpdata.formats.md.
  • Do not preserve dpdata.lammps / dpdata.vasp as top-level exports.
  • Add explicit package exports for the newly moved format subpackages under dpdata.formats.
  • Update direct helper imports in tests/internal code to their new locations:
    • dpdata.formats.cp2k.cell.cell_to_low_triangle
    • dpdata.formats.gaussian.gjf.detect_multiplicity
    • dpdata.formats.qe.traj.convert_celldm
    • dpdata.formats.amber.md.cell_lengths_angles_to_cell
    • dpdata.md.msd.msd
    • dpdata.md.water.*

Follow-up:

  • Removed the legacy dpdata.<format> wrapper modules that were added earlier; this branch no longer keeps those old import paths alive.
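
For downstream code that still uses the old import paths, the rename described above can be sketched as a small rewrite helper. This is only an illustration: `MOVED` and `migrate_import` are hypothetical names, not part of dpdata, and the package list is taken from this PR's commit message with `md` left out because the description keeps it at dpdata.md.

```python
import re

# Format packages that this PR's commit message lists as moving under
# dpdata.formats. "md" is deliberately absent: it stays at dpdata.md.
MOVED = (
    "abacus", "amber", "cp2k", "deepmd", "dftbplus", "fhi_aims", "gaussian",
    "gromacs", "lammps", "lmdb", "openmx", "orca", "psi4", "pwmat",
    "pymatgen", "qe", "rdkit", "siesta", "vasp", "xyz",
)

# Match "dpdata.<format>" only when it is not already part of a longer
# dotted path or identifier (for example "dpdata.formats.vasp").
_OLD_PATH = re.compile(r"(?<![\w.])dpdata\.(" + "|".join(MOVED) + r")\b")


def migrate_import(line: str) -> str:
    """Rewrite an old-style dpdata.<format> path to dpdata.formats.<format>."""
    return _OLD_PATH.sub(r"dpdata.formats.\1", line)


print(migrate_import("from dpdata.vasp.poscar import to_system_data"))
print(migrate_import("import dpdata.md.msd"))
```

Lines that already point at dpdata.formats or at dpdata.md pass through unchanged, so the helper can be run over a file more than once.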

Local checks:

  • cd tests && uv run pytest test_amber_md.py test_cell_to_low_triangle.py test_gaussian_driver.py::TestMakeGaussian::test_detect_multiplicity test_qe_cp_traj.py::TestConverCellDim test_msd.py test_water_ions.py -q → 45 passed
  • uv run pyright → currently reports 2 pre-existing missing _version / __version__ diagnostics in dpdata/__init__.py and dpdata/cli.py
  • git grep -n "from dpdata\.formats\..* import \*\|legacy" -- dpdata tests → no matches

Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

Summary by CodeRabbit

Release Notes

  • Refactor
    • Reorganized internal format handler modules into a dedicated formats subdirectory for improved code structure and maintainability.
    • Updated internal import paths throughout the codebase to reflect the new module organization structure.

@dosubot dosubot Bot added size:M This PR changes 30-99 lines, ignoring generated files. deepmd DeePMD-kit format dpdata labels May 5, 2026
@codspeed-hq

codspeed-hq Bot commented May 5, 2026

Merging this PR will improve performance by 22.65%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.


⚡ 2 improved benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | test_import | 11.1 ms | 9.6 ms | +15.37% |
| WallTime | test_cli | 369.9 ms | 301.6 ms | +22.65% |

Comparing njzjz-bot:oc-fix-pr-946-ci (d6252fc) with master (6cdc360)


@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 98.27586% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.75%. Comparing base (6cdc360) to head (d6252fc).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| dpdata/bond_order_system.py | 75.00% | 1 Missing ⚠️ |
| dpdata/plugins/amber.py | 85.71% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #970      +/-   ##
==========================================
+ Coverage   86.73%   86.75%   +0.01%     
==========================================
  Files          86       89       +3     
  Lines        8084     8093       +9     
==========================================
+ Hits         7012     7021       +9     
  Misses       1072     1072              


@coderabbitai

coderabbitai Bot commented May 5, 2026

📝 Walkthrough

Walkthrough

This PR centralizes format-related modules under a new dpdata.formats package and updates import paths across the codebase to reference dpdata.formats.*. Package initializers for cp2k, gaussian, and qe were added and many relative imports adjusted for the deeper package nesting.

Changes

Module reorganization into dpdata.formats (single cohesive DAG)

Layer / File(s) Summary
Package initializers & top-level
dpdata/formats/__init__.py, dpdata/formats/cp2k/__init__.py, dpdata/formats/gaussian/__init__.py, dpdata/formats/qe/__init__.py, dpdata/__init__.py
Added formats package comment and created cp2k/gaussian/qe package inits (with __all__); dpdata/__init__.py now re-exports only the intended top-level names (keeps md, System, LabeledSystem, MultiSystems, BondOrderSystem, __version__).
Format internal imports
dpdata/formats/... (abacus/scf.py, abacus/stru.py, gaussian/fchk.py, gaussian/log.py, gromacs/gro.py, openmx/omx.py, pwmat/atomconfig.py, pwmat/movement.py, qe/traj.py, cp2k/output.py, ...)
Updated relative imports inside format modules to account for deeper nesting (e.g., ..unit → ...unit, ..periodic_table → ...periodic_table).
Core system / bond-order wiring
dpdata/bond_order_system.py, dpdata/system.py
Switched RDKit, PBC and Amber mask utility imports to dpdata.formats.* and updated corresponding internal calls (e.g., system_data_to_mol, mol_to_system_data, dir_coord).
Plugin wiring (imports & call sites)
dpdata/plugins/* (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz, 3dmol, ...)
Repointed plugin imports and format helper calls from dpdata.<format> to dpdata.formats.<format> and adjusted call targets accordingly (no signature or control-flow changes).
Large-format adapters & misc
dpdata/plugins/deepmd.py, dpdata/plugins/lmdb.py, dpdata/formats/...
DeepMD and HDF5 handling now route through dpdata.formats.deepmd.*; LMDB import updated to dpdata.formats.lmdb.format; related adapters updated similarly.
Tests updated
tests/* (context.py, test_abacus_stru_dump.py, test_lammps_lmp_dump.py, test_lammps_spin.py, test_lmdb.py, test_cell_to_low_triangle.py, test_gaussian_driver.py, test_msd.py, test_qe_cp_traj.py, test_water_ions.py, ...)
Updated test imports and call sites to use dpdata.formats.* equivalents; test logic and assertions remain unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

size:XL, lgtm

Suggested reviewers

  • njzjz
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 37.84% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: refactoring format backends into the dpdata.formats package, which is the primary objective across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dpdata/bond_order_system.py (1)

81-91: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

data-only initialization still raises ValueError due to branch structure

When data is provided without file_name/rdkit_mol, Line 81 initializes from data, but control still reaches the else at Line 91 and raises. This breaks the documented data init path.

Proposed fix
-        if data:
+        if data is not None:
             mol = dpdata.formats.rdkit.utils.system_data_to_mol(data)
             self.from_rdkit_mol(mol)
-        if file_name:
+        elif file_name:
             self.from_fmt(
                 file_name, fmt, type_map=type_map, begin=begin, step=step, **kwargs
             )
-        elif rdkit_mol:
+        elif rdkit_mol is not None:
             self.from_rdkit_mol(rdkit_mol)
         else:
             raise ValueError("Please specify a mol/sdf file or a rdkit Mol object")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/bond_order_system.py` around lines 81 - 91, The init path processes
`data` but then continues into the file/rdkit branches and hits the final else,
causing the erroneous ValueError; update the conditional flow in the
BondOrderSystem initializer (or the method handling construction) so that the
`data` case stops further branching—either change the `if data:` block to `if
data: ... elif file_name: ... elif rdkit_mol: ... else: ...` or keep `if data:`
and add an immediate return after calling from_rdkit_mol; ensure you reference
the existing calls to dpdata.formats.rdkit.utils.system_data_to_mol,
self.from_rdkit_mol, and self.from_fmt when making the change.
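
The control-flow problem described in this comment can be reproduced without dpdata or RDKit. The sketch below is a stand-in, not the real BondOrderSystem initializer: `pick_source_buggy` and `pick_source_fixed` are hypothetical names that only mirror the branch structure quoted in the proposed fix.

```python
def pick_source_buggy(data=None, file_name=None, rdkit_mol=None):
    """Mirror the original branches: a lone `if` followed by if/elif/else."""
    used = []
    if data:
        used.append("data")
    # This second chain runs even when `data` was already handled.
    if file_name:
        used.append("file_name")
    elif rdkit_mol:
        used.append("rdkit_mol")
    else:
        raise ValueError("Please specify a mol/sdf file or a rdkit Mol object")
    return used


def pick_source_fixed(data=None, file_name=None, rdkit_mol=None):
    """Mirror the proposed fix: one chain, so exactly one branch runs."""
    if data is not None:
        return ["data"]
    elif file_name:
        return ["file_name"]
    elif rdkit_mol is not None:
        return ["rdkit_mol"]
    else:
        raise ValueError("Please specify a mol/sdf file or a rdkit Mol object")


try:
    pick_source_buggy(data={"atom_names": ["H"]})
except ValueError as err:
    print("buggy version raised:", err)

print("fixed version used:", pick_source_fixed(data={"atom_names": ["H"]}))
```

With only `data` supplied, the first function falls through to the final `else` and raises, while the second returns after the `data` branch.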
dpdata/plugins/pymatgen.py (1)

72-72: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add missing import dpdata.system to fix runtime AttributeError.

Line 72 uses dpdata.system.remove_pbc(data), but dpdata.system is not imported. The dpdata package does not re-export the system module in its __init__.py—only individual classes from it are exposed. This will raise AttributeError at runtime when to_system is called on a PyMatgenMoleculeFormat instance.

Proposed fix — add explicit import
 import dpdata.formats.pymatgen.molecule
 import dpdata.formats.pymatgen.structure
+import dpdata.system
 from dpdata.format import Format
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/plugins/pymatgen.py` at line 72, The call data =
dpdata.system.remove_pbc(data) fails at runtime because the dpdata.system
submodule isn't imported; add an explicit import (e.g., import dpdata.system) at
the top of the file and then leave the call in PyMatgenMoleculeFormat.to_system
as-is so dpdata.system.remove_pbc(data) resolves correctly.
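
Whether the AttributeError actually occurs depends on Python's import semantics: a submodule becomes an attribute of its parent package once anything has imported it, including a `from .system import ...` line in the package's own `__init__.py`. The sketch below illustrates that behaviour with two throwaway packages written to a temporary directory. The names `demo_bare` and `demo_reexport` are hypothetical, and this is not a check of dpdata itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

SYSTEM_SRC = "def remove_pbc(data):\n    return {**data, 'nopbc': True}\n"

root = Path(tempfile.mkdtemp())
files = {
    # Package whose __init__ imports nothing from its submodule.
    "demo_bare/__init__.py": "",
    "demo_bare/system.py": SYSTEM_SRC,
    # Package whose __init__ re-exports one name from its submodule.
    "demo_reexport/__init__.py": "from .system import remove_pbc\n",
    "demo_reexport/system.py": SYSTEM_SRC,
}
for rel, text in files.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

bare = importlib.import_module("demo_bare")
bare_before = hasattr(bare, "system")  # nothing has loaded the submodule yet

importlib.import_module("demo_bare.system")  # the explicit import
bare_after = hasattr(bare, "system")

reexport = importlib.import_module("demo_reexport")
reexport_has_system = hasattr(reexport, "system")

print(bare_before, bare_after, reexport_has_system)
```

The explicit `import dpdata.system` proposed above is harmless either way, and it removes the dependence on some other module having loaded the submodule first.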
🧹 Nitpick comments (1)
dpdata/plugins/xyz.py (1)

11-14: ⚡ Quick win

Runtime imports placed after the if TYPE_CHECKING: block — invert the order.

Lines 13–14 are unconditional runtime imports but appear after the if TYPE_CHECKING: guard. The conventional (and ruff/isort-expected) layout places the if TYPE_CHECKING: block last among all imports. This ordering may trigger an I001 violation depending on the project's ruff configuration.

♻️ Proposed fix
+from dpdata.formats.xyz.quip_gap_xyz import QuipGapxyzSystems, format_single_frame
+from dpdata.formats.xyz.xyz import coord_to_xyz, xyz_to_coord
+
 if TYPE_CHECKING:
     from dpdata.utils import FileType
-from dpdata.formats.xyz.quip_gap_xyz import QuipGapxyzSystems, format_single_frame
-from dpdata.formats.xyz.xyz import coord_to_xyz, xyz_to_coord

As per coding guidelines, dpdata/**/*.py files must pass ruff check dpdata/ before committing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/plugins/xyz.py` around lines 11 - 14, The import ordering is wrong:
move the TYPE_CHECKING block so it comes after the runtime imports (i.e., place
the "if TYPE_CHECKING: from dpdata.utils import FileType" block below the
imports of QuipGapxyzSystems, format_single_frame, coord_to_xyz, and
xyz_to_coord) to satisfy ruff/isort expectations and avoid I001; ensure the
runtime symbols QuipGapxyzSystems, format_single_frame, coord_to_xyz, and
xyz_to_coord remain imported unconditionally and only FileType is guarded by
TYPE_CHECKING.
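
As a minimal sketch of the layout this nitpick asks for, with runtime imports first and the type-only import last, consider the standalone module below. `file_stem` is a hypothetical function and `pathlib.Path` stands in for `dpdata.utils.FileType`; whether ruff flags the original order depends on the project's configuration.

```python
import os
from typing import TYPE_CHECKING

# Runtime imports come first; the TYPE_CHECKING block closes the import section.
if TYPE_CHECKING:
    # Seen by type checkers only; this import is skipped at runtime.
    from pathlib import Path


def file_stem(path: "str | Path") -> str:
    """Return the file name without its directory or extension."""
    return os.path.splitext(os.path.basename(os.fspath(path)))[0]


print(file_stem("frames/water.xyz"))
```

The annotation is quoted so that the name `Path` never has to exist at runtime.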
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@dpdata/__init__.py`:
- Line 4: Run ruff check and fix the reported lint issues: sort the __all__
lists in dpdata/__init__ and dpdata/formats/deepmd/hdf5.py, replace mutable
default args/attributes in functions/classes (e.g., in ase_calculator.py,
rdf.py, water.py) with None and set defaults inside the function or __init__,
add explicit stacklevel=2 to all warnings.warn calls, rename variables shadowing
builtins (e.g., in driver.py, hdf5.py and pwmat-related files), remove or use
unused loop control variables (abacus, cp2k, fhi_aims, gromacs, lammps, md,
openmx, pwmat modules) or replace with _ if intentionally unused, add strict=...
to zip() calls, and address the remaining RUF/B/BLE/etc. issues (unused
unpacking, unnecessary conversions/concatenations, empty abstract methods,
assert False, missing shebangs, ambiguous names, overly broad exception
handlers) as indicated by ruff to make the codebase clean.

In `@dpdata/plugins/openmx.py`:
- Line 64: The unpacked variable `cs` from the call to
dpdata.formats.openmx.omx.to_system_data(fname, mdname) is unused and causes a
Ruff RUF059 lint error; update the unpack to either capture the unused value as
`_` (e.g., `data, _ = ...`) or assign only `data` (e.g., `data = ...`) inside
openmx.py where the call occurs, and then run ruff check dpdata/ and ruff format
dpdata/ to ensure linting/formatting compliance.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9d56fd68-bd9e-45f8-b025-b4160b3f00b9

📥 Commits

Reviewing files that changed from the base of the PR and between 99ea3bb and a3ddf7f.

📒 Files selected for processing (99)
  • dpdata/__init__.py
  • dpdata/bond_order_system.py
  • dpdata/formats/__init__.py
  • dpdata/formats/abacus/__init__.py
  • dpdata/formats/abacus/md.py
  • dpdata/formats/abacus/relax.py
  • dpdata/formats/abacus/scf.py
  • dpdata/formats/abacus/stru.py
  • dpdata/formats/amber/__init__.py
  • dpdata/formats/amber/mask.py
  • dpdata/formats/amber/md.py
  • dpdata/formats/amber/sqm.py
  • dpdata/formats/cp2k/__init__.py
  • dpdata/formats/cp2k/cell.py
  • dpdata/formats/cp2k/output.py
  • dpdata/formats/deepmd/__init__.py
  • dpdata/formats/deepmd/comp.py
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/formats/deepmd/raw.py
  • dpdata/formats/dftbplus/__init__.py
  • dpdata/formats/dftbplus/output.py
  • dpdata/formats/fhi_aims/__init__.py
  • dpdata/formats/fhi_aims/output.py
  • dpdata/formats/gaussian/__init__.py
  • dpdata/formats/gaussian/fchk.py
  • dpdata/formats/gaussian/gjf.py
  • dpdata/formats/gaussian/log.py
  • dpdata/formats/gromacs/__init__.py
  • dpdata/formats/gromacs/gro.py
  • dpdata/formats/lammps/__init__.py
  • dpdata/formats/lammps/dump.py
  • dpdata/formats/lammps/lmp.py
  • dpdata/formats/lmdb/__init__.py
  • dpdata/formats/lmdb/format.py
  • dpdata/formats/md/__init__.py
  • dpdata/formats/md/msd.py
  • dpdata/formats/md/pbc.py
  • dpdata/formats/md/rdf.py
  • dpdata/formats/md/water.py
  • dpdata/formats/openmx/__init__.py
  • dpdata/formats/openmx/omx.py
  • dpdata/formats/orca/__init__.py
  • dpdata/formats/orca/output.py
  • dpdata/formats/psi4/__init__.py
  • dpdata/formats/psi4/input.py
  • dpdata/formats/psi4/output.py
  • dpdata/formats/pwmat/__init__.py
  • dpdata/formats/pwmat/atomconfig.py
  • dpdata/formats/pwmat/movement.py
  • dpdata/formats/pymatgen/__init__.py
  • dpdata/formats/pymatgen/molecule.py
  • dpdata/formats/pymatgen/structure.py
  • dpdata/formats/qe/__init__.py
  • dpdata/formats/qe/scf.py
  • dpdata/formats/qe/traj.py
  • dpdata/formats/rdkit/__init__.py
  • dpdata/formats/rdkit/sanitize.py
  • dpdata/formats/rdkit/utils.py
  • dpdata/formats/siesta/__init__.py
  • dpdata/formats/siesta/aiMD_output.py
  • dpdata/formats/siesta/output.py
  • dpdata/formats/vasp/__init__.py
  • dpdata/formats/vasp/outcar.py
  • dpdata/formats/vasp/poscar.py
  • dpdata/formats/vasp/xml.py
  • dpdata/formats/xyz/__init__.py
  • dpdata/formats/xyz/quip_gap_xyz.py
  • dpdata/formats/xyz/xyz.py
  • dpdata/plugins/3dmol.py
  • dpdata/plugins/abacus.py
  • dpdata/plugins/amber.py
  • dpdata/plugins/cp2k.py
  • dpdata/plugins/deepmd.py
  • dpdata/plugins/dftbplus.py
  • dpdata/plugins/fhi_aims.py
  • dpdata/plugins/gaussian.py
  • dpdata/plugins/gromacs.py
  • dpdata/plugins/lammps.py
  • dpdata/plugins/lmdb.py
  • dpdata/plugins/openmx.py
  • dpdata/plugins/orca.py
  • dpdata/plugins/psi4.py
  • dpdata/plugins/pwmat.py
  • dpdata/plugins/pymatgen.py
  • dpdata/plugins/qe.py
  • dpdata/plugins/rdkit.py
  • dpdata/plugins/siesta.py
  • dpdata/plugins/vasp.py
  • dpdata/plugins/xyz.py
  • dpdata/siesta/__init__.py
  • dpdata/system.py
  • dpdata/vasp/__init__.py
  • dpdata/xyz/__init__.py
  • tests/context.py
  • tests/test_abacus_stru_dump.py
  • tests/test_lammps_lmp_dump.py
  • tests/test_lammps_spin.py
  • tests/test_lmdb.py

Comment thread dpdata/__init__.py Outdated
Comment thread dpdata/plugins/openmx.py
@njzjz-bot njzjz-bot force-pushed the oc-fix-pr-946-ci branch 4 times, most recently from 4c16ae8 to fda1705 Compare May 5, 2026 08:18
@njzjz njzjz requested a review from Copilot May 5, 2026 08:57
Contributor

Copilot AI left a comment


Pull request overview

This PR rebases the prior directory reorganization (moving format implementations under dpdata.formats) and updates internal imports/tests accordingly, with the stated goal of restoring backward-compatible access to historically public helper modules/namespaces after the move.

Changes:

  • Relocates/introduces many format implementation modules under dpdata/formats/** (e.g., VASP/LAMMPS/QE/Gaussian/CP2K/LMDB, etc.).
  • Updates plugin modules and some core code to import from dpdata.formats.* instead of historical locations.
  • Adds a lazy attribute-based loader in dpdata/__init__.py for some format modules (cp2k, gaussian, qe).
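
The lazy attribute-based loader mentioned in the last bullet is typically built on a module-level `__getattr__` (PEP 562). The sketch below shows the general technique on a synthetic module, assuming nothing about the actual code in dpdata/__init__.py, which may no longer contain such a loader in later revisions of this PR. The names `lazy_demo` and `_LAZY_TARGETS` are hypothetical, and standard-library modules stand in for the format packages.

```python
import importlib
import types

# Synthetic module standing in for a package's __init__.
lazy_demo = types.ModuleType("lazy_demo")

# Attribute name -> module to import on first access (stand-ins only).
_LAZY_TARGETS = {"json_backend": "json", "csv_backend": "csv"}
loaded = []  # records which targets were actually imported


def _lazy_getattr(name):
    """Import the target module on first access and cache it on the module."""
    if name in _LAZY_TARGETS:
        module = importlib.import_module(_LAZY_TARGETS[name])
        setattr(lazy_demo, name, module)  # later lookups bypass __getattr__
        loaded.append(name)
        return module
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# Python consults a module's __getattr__ only when normal lookup fails.
lazy_demo.__getattr__ = _lazy_getattr

print(lazy_demo.json_backend.dumps({"a": 1}))
print(loaded)
```

Only the attributes that are actually touched trigger an import, which is one way such a loader can reduce package import time.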

Reviewed changes

Copilot reviewed 44 out of 99 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
dpdata/__init__.py Switches top-level exports to dpdata.formats.* and adds lazy __getattr__ for some format modules.
dpdata/system.py Updates internal imports to dpdata.formats.md.pbc and dpdata.formats.amber.mask.
dpdata/bond_order_system.py Updates RDKit helpers import path to dpdata.formats.rdkit.*.
dpdata/plugins/3dmol.py Updates XYZ helper import path to dpdata.formats.xyz.xyz.
dpdata/plugins/abacus.py Points ABACUS plugin imports to dpdata.formats.abacus.*.
dpdata/plugins/amber.py Points Amber plugin imports to dpdata.formats.amber.*.
dpdata/plugins/cp2k.py Points CP2K plugin imports to dpdata.formats.cp2k.output.
dpdata/plugins/deepmd.py Points DeePMD plugin imports to dpdata.formats.deepmd.*.
dpdata/plugins/dftbplus.py Points DFTB+ plugin imports to dpdata.formats.dftbplus.output.
dpdata/plugins/fhi_aims.py Points FHI-aims plugin imports to dpdata.formats.fhi_aims.output.
dpdata/plugins/gaussian.py Points Gaussian plugin imports to dpdata.formats.gaussian.* and updates doc references.
dpdata/plugins/gromacs.py Points Gromacs plugin imports to dpdata.formats.gromacs.gro.
dpdata/plugins/lammps.py Points LAMMPS plugin imports to dpdata.formats.lammps.*.
dpdata/plugins/lmdb.py Registers LMDB format via dpdata.formats.lmdb.format.LMDBFormat.
dpdata/plugins/openmx.py Points OpenMX plugin imports to dpdata.formats.openmx.* and dpdata.formats.md.pbc.
dpdata/plugins/orca.py Points ORCA plugin imports to dpdata.formats.orca.output.
dpdata/plugins/psi4.py Points Psi4 plugin imports to dpdata.formats.psi4.*.
dpdata/plugins/pwmat.py Points PWMAT plugin imports to dpdata.formats.pwmat.*.
dpdata/plugins/pymatgen.py Points pymatgen plugin imports to dpdata.formats.pymatgen.*.
dpdata/plugins/qe.py Points QE plugin imports to dpdata.formats.qe.* and dpdata.formats.md.pbc.
dpdata/plugins/rdkit.py Points RDKit plugin imports to dpdata.formats.rdkit.utils.
dpdata/plugins/siesta.py Points SIESTA plugin imports to dpdata.formats.siesta.*.
dpdata/plugins/vasp.py Points VASP plugin imports to dpdata.formats.vasp.*.
dpdata/plugins/xyz.py Points XYZ plugin imports to dpdata.formats.xyz.*.
dpdata/formats/__init__.py Introduces the dpdata.formats package marker.
dpdata/formats/abacus/md.py Adds/relocates ABACUS MD reader under formats.
dpdata/formats/abacus/relax.py Adds/relocates ABACUS relax reader under formats.
dpdata/formats/abacus/scf.py Adjusts imports for ABACUS SCF under formats.
dpdata/formats/abacus/stru.py Adjusts imports for ABACUS STRU under formats.
dpdata/formats/amber/mask.py Adds/relocates Amber mask utilities under formats.
dpdata/formats/amber/md.py Adjusts imports for Amber MD under formats.
dpdata/formats/amber/sqm.py Adds/relocates SQM parsing/input generation under formats.
dpdata/formats/cp2k/cell.py Adds/relocates CP2K cell helper under formats.
dpdata/formats/cp2k/output.py Adjusts imports for CP2K output reader under formats.
dpdata/formats/deepmd/comp.py Adds/relocates deepmd/npy (“comp”) support under formats.
dpdata/formats/deepmd/hdf5.py Adds/relocates deepmd/hdf5 support under formats.
dpdata/formats/deepmd/mixed.py Adds/relocates deepmd mixed-type utilities under formats.
dpdata/formats/deepmd/raw.py Adds/relocates deepmd/raw support under formats.
dpdata/formats/dftbplus/output.py Adds/relocates DFTB+ output reader under formats.
dpdata/formats/fhi_aims/output.py Adds/relocates FHI-aims output reader under formats.
dpdata/formats/gaussian/fchk.py Adjusts relative imports for Gaussian fchk under formats.
dpdata/formats/gaussian/gjf.py Adds/relocates Gaussian input generator/parser under formats.
dpdata/formats/gaussian/log.py Adjusts relative imports for Gaussian log under formats.
dpdata/formats/gromacs/gro.py Adjusts relative imports for Gromacs gro under formats.
dpdata/formats/lammps/dump.py Adds/relocates LAMMPS dump parsing/writing under formats.
dpdata/formats/lammps/lmp.py Adds/relocates LAMMPS data-file parsing/writing under formats.
dpdata/formats/lmdb/format.py Adds/relocates LMDB format implementation under formats.
dpdata/formats/md/msd.py Adds/relocates MSD implementation under formats.
dpdata/formats/md/pbc.py Adds/relocates PBC utilities under formats.
dpdata/formats/md/rdf.py Adds/relocates RDF implementation under formats.
dpdata/formats/md/water.py Adds/relocates water analysis utilities under formats.
dpdata/formats/openmx/omx.py Adjusts relative imports for OpenMX under formats.
dpdata/formats/orca/output.py Adds/relocates ORCA output reader under formats.
dpdata/formats/psi4/input.py Adds/relocates Psi4 input writer under formats.
dpdata/formats/psi4/output.py Adds/relocates Psi4 output reader under formats.
dpdata/formats/pwmat/atomconfig.py Adjusts relative imports for PWMAT atomconfig under formats.
dpdata/formats/pwmat/movement.py Adjusts relative imports for PWMAT movement under formats.
dpdata/formats/pymatgen/molecule.py Adds/relocates pymatgen Molecule conversion under formats.
dpdata/formats/pymatgen/structure.py Adds/relocates pymatgen Structure conversion under formats.
dpdata/formats/qe/scf.py Adds/relocates QE SCF parsing under formats.
dpdata/formats/qe/traj.py Fixes relative imports within QE traj under formats.
dpdata/formats/rdkit/utils.py Adds/relocates RDKit helper utilities under formats.
dpdata/formats/siesta/aiMD_output.py Adds/relocates SIESTA aiMD output reader under formats.
dpdata/formats/siesta/output.py Adds/relocates SIESTA output reader under formats.
dpdata/formats/vasp/outcar.py Adds/relocates VASP OUTCAR parsing under formats.
dpdata/formats/vasp/poscar.py Adds/relocates VASP POSCAR parsing/writing under formats.
dpdata/formats/vasp/xml.py Adds/relocates VASP XML parsing under formats.
dpdata/formats/xyz/quip_gap_xyz.py Adds/relocates QUIP/GAP XYZ support under formats.
dpdata/formats/xyz/xyz.py Adds/relocates basic XYZ conversions under formats.
tests/context.py Updates import smoke-loading to dpdata.formats.*.
tests/test_abacus_stru_dump.py Updates ABACUS test imports to dpdata.formats.abacus.*.
tests/test_lammps_lmp_dump.py Updates LAMMPS test imports to dpdata.formats.lammps.*.
tests/test_lammps_spin.py Updates LAMMPS test imports to dpdata.formats.lammps.*.
tests/test_lmdb.py Updates LMDB test imports to dpdata.formats.lmdb.*.


Comment thread dpdata/__init__.py Outdated
Comment thread tests/context.py Outdated
Comment thread tests/test_lammps_spin.py
Comment thread tests/test_lammps_lmp_dump.py
@njzjz-bot njzjz-bot force-pushed the oc-fix-pr-946-ci branch 3 times, most recently from 92a9266 to e23e05e Compare May 5, 2026 09:09
@njzjz-bot njzjz-bot changed the title fix: expose moved format helper modules Fix CI after moving format modules May 5, 2026
@njzjz-bot njzjz-bot force-pushed the oc-fix-pr-946-ci branch 2 times, most recently from 2482d02 to ae36482 Compare May 5, 2026 11:24
@njzjz njzjz linked an issue May 5, 2026 that may be closed by this pull request
@njzjz-bot njzjz-bot changed the title Fix CI after moving format modules Refactor format backends into dpdata.formats May 5, 2026
@njzjz njzjz requested a review from wanghan-iapcm May 5, 2026 16:01
@njzjz njzjz changed the title Refactor format backends into dpdata.formats refact: refactor format backends into dpdata.formats May 5, 2026
@wanghan-iapcm
Contributor

Conflicts should be resolved.

OpenClaw Bot and others added 3 commits May 6, 2026 10:21
Move all format directories (abacus, amber, cp2k, deepmd, dftbplus,
fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4,
pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/
subdirectory.

This addresses issue deepmodeling#934.

Changes:
- Created dpdata/formats/ directory
- Moved all format directories to dpdata/formats/
- Updated all import statements throughout the codebase
- Updated relative imports in format modules (from .. to from ...)
- Updated dpdata/__init__.py to import from new locations
- Updated tests/context.py for new import paths

The plugins directory remains at the root level as requested.
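
The change from `..` to `...` in relative imports follows from the extra package level. A self-contained sketch using a throwaway package (hypothetical name `demo_dp`, with a layout that only mimics dpdata/formats/qe/traj.py) shows why a stale two-dot import stops resolving after the move:

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
files = {
    "demo_dp/__init__.py": "",
    "demo_dp/unit.py": "LENGTH = 'angstrom'\n",
    "demo_dp/formats/__init__.py": "",
    "demo_dp/formats/qe/__init__.py": "",
    # Inside demo_dp.formats.qe: "." is qe, ".." is formats, "..." is demo_dp.
    "demo_dp/formats/qe/traj.py": "from ...unit import LENGTH\n",
    # Stale import left over from the old layout: looks for demo_dp.formats.unit.
    "demo_dp/formats/qe/stale.py": "from ..unit import LENGTH\n",
}
for rel, text in files.items():
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

traj = importlib.import_module("demo_dp.formats.qe.traj")
print("three dots resolved:", traj.LENGTH)

try:
    importlib.import_module("demo_dp.formats.qe.stale")
    stale_ok = True
except ModuleNotFoundError as err:
    stale_ok = False
    print("two dots failed:", err)
```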
@njzjz-bot njzjz-bot force-pushed the oc-fix-pr-946-ci branch from ae36482 to 9680f7c Compare May 6, 2026 10:31
@dosubot dosubot Bot removed the size:M This PR changes 30-99 lines, ignoring generated files. label May 6, 2026
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 6, 2026
@njzjz njzjz marked this pull request as draft May 7, 2026 04:16
Authored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
@njzjz-bot njzjz-bot force-pushed the oc-fix-pr-946-ci branch from cfdb266 to d6252fc Compare May 7, 2026 05:44

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dpdata/plugins/xyz.py (1)

1-117: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Run mandated Ruff checks before merge.

Please confirm ruff check dpdata/ and ruff format dpdata/ were run for this PR branch before merging.

As per coding guidelines, "Run ruff linting with ruff check dpdata/ and format code with ruff format dpdata/ before committing".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/plugins/xyz.py` around lines 1 - 117, Run the repository
linter/formatter on the dpdata package and commit any fixes: execute "ruff check
dpdata/" to identify issues and "ruff format dpdata/" to auto-format, then
re-run "ruff check dpdata/" to confirm there are no remaining offenses; ensure
changes (especially in the modified symbols/classes like XYZFormat and
QuipGapXYZFormat in dpdata/plugins/xyz.py and any related imports such as
coord_to_xyz, xyz_to_coord, QuipGapxyzSystems, format_single_frame) are staged
and committed before merging.
♻️ Duplicate comments (1)
dpdata/plugins/openmx.py (1)

64-64: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Remove unused unpacked variable to satisfy lint.

The variable `cs` is assigned but never used, which triggers Ruff RUF059. Use `_` instead, matching the pattern already used on line 38.

🔧 Proposed fix
-        data, cs = dpdata.formats.openmx.omx.to_system_data(fname, mdname)
+        data, _ = dpdata.formats.openmx.omx.to_system_data(fname, mdname)

As per coding guidelines, `dpdata/**/*.py`: Run ruff linting with `ruff check dpdata/` and format code with `ruff format dpdata/` before committing.
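
To illustrate the unpacking pattern in the finding above, here is a minimal, self-contained sketch. The `to_system_data` function below is a hypothetical stand-in that only mimics a two-value return; it is not the real `dpdata.formats.openmx.omx.to_system_data`, and the names `load_flagged` and `load_clean` are invented for this example.

```python
from __future__ import annotations


# Hypothetical stand-in for a parser that returns a (data, cell_info) pair.
# This is NOT the real dpdata.formats.openmx.omx.to_system_data.
def to_system_data(fname: str, mdname: str) -> tuple[dict, list[float]]:
    data = {"source": fname, "trajectory": mdname}
    cell_info = [1.0, 1.0, 1.0]
    return data, cell_info


def load_flagged(fname: str, mdname: str) -> dict:
    # The second value is bound to a name that is never read afterwards;
    # this is the shape the review reports as RUF059.
    data, cs = to_system_data(fname, mdname)
    return data


def load_clean(fname: str, mdname: str) -> dict:
    # "_" signals that the second value is intentionally discarded.
    data, _ = to_system_data(fname, mdname)
    return data
```

Both functions return the same data, so the change is behavior-preserving and only affects what the linter reports.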

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dpdata/plugins/openmx.py` at line 64, The assignment unpacks two values from
dpdata.formats.openmx.omx.to_system_data(fname, mdname) into variables `data,
cs` but `cs` is unused and triggers RUF059; change the unused unpacked variable
`cs` to `_` (i.e., `data, _ = dpdata.formats.openmx.omx.to_system_data(fname,
mdname)`) in the dpdata.plugins.openmx module and then run ruff check/format on
the dpdata/ package as per guidelines.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b2ea20e0-3772-4448-8e73-4cff8fb7f254

📥 Commits

Reviewing files that changed from the base of the PR and between 2482d02 and d6252fc.

📒 Files selected for processing (100)
  • dpdata/__init__.py
  • dpdata/bond_order_system.py
  • dpdata/formats/__init__.py
  • dpdata/formats/abacus/__init__.py
  • dpdata/formats/abacus/md.py
  • dpdata/formats/abacus/relax.py
  • dpdata/formats/abacus/scf.py
  • dpdata/formats/abacus/stru.py
  • dpdata/formats/amber/__init__.py
  • dpdata/formats/amber/mask.py
  • dpdata/formats/amber/md.py
  • dpdata/formats/amber/sqm.py
  • dpdata/formats/cp2k/__init__.py
  • dpdata/formats/cp2k/cell.py
  • dpdata/formats/cp2k/output.py
  • dpdata/formats/deepmd/__init__.py
  • dpdata/formats/deepmd/comp.py
  • dpdata/formats/deepmd/hdf5.py
  • dpdata/formats/deepmd/mixed.py
  • dpdata/formats/deepmd/raw.py
  • dpdata/formats/dftbplus/__init__.py
  • dpdata/formats/dftbplus/output.py
  • dpdata/formats/fhi_aims/__init__.py
  • dpdata/formats/fhi_aims/output.py
  • dpdata/formats/gaussian/__init__.py
  • dpdata/formats/gaussian/fchk.py
  • dpdata/formats/gaussian/gjf.py
  • dpdata/formats/gaussian/log.py
  • dpdata/formats/gromacs/__init__.py
  • dpdata/formats/gromacs/gro.py
  • dpdata/formats/lammps/__init__.py
  • dpdata/formats/lammps/dump.py
  • dpdata/formats/lammps/lmp.py
  • dpdata/formats/lmdb/__init__.py
  • dpdata/formats/lmdb/format.py
  • dpdata/formats/openmx/__init__.py
  • dpdata/formats/openmx/omx.py
  • dpdata/formats/orca/__init__.py
  • dpdata/formats/orca/output.py
  • dpdata/formats/psi4/__init__.py
  • dpdata/formats/psi4/input.py
  • dpdata/formats/psi4/output.py
  • dpdata/formats/pwmat/__init__.py
  • dpdata/formats/pwmat/atomconfig.py
  • dpdata/formats/pwmat/movement.py
  • dpdata/formats/pymatgen/__init__.py
  • dpdata/formats/pymatgen/molecule.py
  • dpdata/formats/pymatgen/structure.py
  • dpdata/formats/qe/__init__.py
  • dpdata/formats/qe/scf.py
  • dpdata/formats/qe/traj.py
  • dpdata/formats/rdkit/__init__.py
  • dpdata/formats/rdkit/sanitize.py
  • dpdata/formats/rdkit/utils.py
  • dpdata/formats/siesta/__init__.py
  • dpdata/formats/siesta/aiMD_output.py
  • dpdata/formats/siesta/output.py
  • dpdata/formats/vasp/__init__.py
  • dpdata/formats/vasp/outcar.py
  • dpdata/formats/vasp/poscar.py
  • dpdata/formats/vasp/xml.py
  • dpdata/formats/xyz/__init__.py
  • dpdata/formats/xyz/quip_gap_xyz.py
  • dpdata/formats/xyz/xyz.py
  • dpdata/plugins/3dmol.py
  • dpdata/plugins/abacus.py
  • dpdata/plugins/amber.py
  • dpdata/plugins/cp2k.py
  • dpdata/plugins/deepmd.py
  • dpdata/plugins/dftbplus.py
  • dpdata/plugins/fhi_aims.py
  • dpdata/plugins/gaussian.py
  • dpdata/plugins/gromacs.py
  • dpdata/plugins/lammps.py
  • dpdata/plugins/lmdb.py
  • dpdata/plugins/openmx.py
  • dpdata/plugins/orca.py
  • dpdata/plugins/psi4.py
  • dpdata/plugins/pwmat.py
  • dpdata/plugins/pymatgen.py
  • dpdata/plugins/qe.py
  • dpdata/plugins/rdkit.py
  • dpdata/plugins/siesta.py
  • dpdata/plugins/vasp.py
  • dpdata/plugins/xyz.py
  • dpdata/siesta/__init__.py
  • dpdata/system.py
  • dpdata/vasp/__init__.py
  • dpdata/xyz/__init__.py
  • tests/context.py
  • tests/test_abacus_stru_dump.py
  • tests/test_amber_md.py
  • tests/test_cell_to_low_triangle.py
  • tests/test_gaussian_driver.py
  • tests/test_lammps_lmp_dump.py
  • tests/test_lammps_spin.py
  • tests/test_lmdb.py
  • tests/test_msd.py
  • tests/test_qe_cp_traj.py
  • tests/test_water_ions.py
✅ Files skipped from review due to trivial changes (20)
  • dpdata/formats/pwmat/movement.py
  • dpdata/formats/__init__.py
  • dpdata/formats/gromacs/gro.py
  • dpdata/formats/abacus/scf.py
  • dpdata/plugins/lmdb.py
  • dpdata/plugins/3dmol.py
  • tests/test_lmdb.py
  • dpdata/formats/gaussian/log.py
  • tests/test_amber_md.py
  • dpdata/formats/cp2k/output.py
  • dpdata/plugins/dftbplus.py
  • dpdata/formats/qe/__init__.py
  • dpdata/plugins/psi4.py
  • dpdata/formats/gaussian/__init__.py
  • dpdata/formats/cp2k/__init__.py
  • tests/test_qe_cp_traj.py
  • dpdata/plugins/cp2k.py
  • dpdata/plugins/lammps.py
  • dpdata/plugins/rdkit.py
  • tests/test_water_ions.py
🚧 Files skipped from review as they are similar to previous changes (21)
  • dpdata/formats/pwmat/atomconfig.py
  • tests/test_lammps_lmp_dump.py
  • dpdata/formats/amber/md.py
  • dpdata/formats/abacus/stru.py
  • dpdata/formats/qe/traj.py
  • tests/test_msd.py
  • dpdata/formats/gaussian/fchk.py
  • dpdata/plugins/orca.py
  • tests/test_cell_to_low_triangle.py
  • tests/context.py
  • dpdata/plugins/fhi_aims.py
  • dpdata/plugins/gromacs.py
  • dpdata/plugins/siesta.py
  • tests/test_abacus_stru_dump.py
  • dpdata/plugins/vasp.py
  • dpdata/formats/openmx/omx.py
  • dpdata/system.py
  • dpdata/plugins/deepmd.py
  • dpdata/plugins/gaussian.py
  • dpdata/plugins/amber.py
  • dpdata/plugins/qe.py

Labels

deepmd DeePMD-kit format dpdata size:XXL This PR changes 1000+ lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Feature Request] Reorganize the directory structure

3 participants